Binary Neural Networks Algorithms, Architectures, and Applications (Baochang Zhang, Sheng Xu, Mingbao Lin etc.)

Applications

TABLE 1.2
Experimental results of some famous binary methods on ImageNet.

Methods              Weights / Activations  Model      Binarized Acc. (%)  Full-precision Acc. (%)
                                                       Top-1     Top-5     Top-1     Top-5
XNOR-Net [199]       Binary                 ResNet-18  51.2      73.2      69.3      89.2
ABC-Net [147]        Binary                 ResNet-50  70.1      89.7      76.1      92.8
LBCNN [109]          Binary                 –          62.43¹    –         64.94     –
Bi-Real Net [159]    Binary                 ResNet-34  62.2      83.9      73.3      91.3
PCNN [77]            Binary                 ResNet-18  57.3      80.0      69.3      89.2
RBCN [148]           Binary                 ResNet-18  59.5      81.6      69.3      89.2
BinaryDenseNet [12]                         –          62.5      83.9      –         –
BNAS [36]                                   –          71.3      90.3      –         –

1.2.1 Image Classification

Image classification aims to assign images to semantic classes. Many works treat image classification performance as the criterion for the success of BNNs. Five datasets are commonly used for image classification tasks: MNIST [181], SVHN, CIFAR-10 [122], CIFAR-100, and ImageNet [204]. Among them, ImageNet is the most difficult to train on and consists of 1,000 classes of images. Table 1.2 shows the experimental results of some of the most popular binary methods on ImageNet.
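As a rough aid for reading Table 1.2, the sketch below computes the Top-1 gap between each binarized model and its full-precision counterpart. It uses only the rows that report a named backbone together with both Top-1 values; the dictionary and function names are illustrative and do not come from the cited works.

```python
# Top-1 accuracies (%) copied from Table 1.2 as (binarized, full-precision).
# Rows without a named backbone or without a full-precision value are omitted.
TABLE_1_2_TOP1 = {
    "XNOR-Net (ResNet-18)": (51.2, 69.3),
    "ABC-Net (ResNet-50)": (70.1, 76.1),
    "Bi-Real Net (ResNet-34)": (62.2, 73.3),
    "PCNN (ResNet-18)": (57.3, 69.3),
    "RBCN (ResNet-18)": (59.5, 69.3),
}


def top1_gap(binarized, full_precision):
    """Accuracy lost to binarization, in percentage points."""
    return round(full_precision - binarized, 1)


gaps = {name: top1_gap(b, fp) for name, (b, fp) in TABLE_1_2_TOP1.items()}

# Print methods from smallest to largest gap.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} {gap:5.1f}")
```

Because the backbones differ, the gaps are comparable only between methods that share a model, such as the three ResNet-18 entries.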

1.2.2 Speech Recognition

Speech recognition is a technique or capability that enables a program or system to process human speech. Binary methods can be used to carry out speech recognition tasks on edge computing devices.

Xiang et al. [252] applied binary DNNs to speech recognition tasks. Experiments on TIMIT phone recognition and 50-hour Switchboard speech recognition show that binary DNNs can run about four times faster than standard DNNs during inference, with roughly 10.0% relative accuracy reduction.

Zheng et al. [290] and Yin et al. [273] also implemented binarized CNN-based speech recognition.

1.2.3 Object Detection and Tracking

Object detection is the process of finding a target in a scene, while object tracking follows a target across consecutive frames of a video.

Sun et al. [218] propose a fast object detection algorithm based on BNNs. Compared to full-precision convolution, this method yields, in theory, 62 times faster convolutional operations and a 32-fold memory saving.
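The 32-fold memory figure is consistent with storing each value in 1 bit instead of a 32-bit float, and the speedup of binary convolution generally comes from replacing multiply-accumulate with XNOR and popcount on packed bits. The following is a minimal pure-Python sketch of that general idea, not the implementation of Sun et al. [218]; the helper names `pack_bits` and `binary_dot` are hypothetical, and the sketch does not attempt to reproduce the 62 times speedup figure.

```python
def pack_bits(signs):
    """Pack a sequence of +1/-1 values into one int (bit = 1 for +1, 0 for -1)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1}^n vectors given as packed bits.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]

# The packed result agrees with the ordinary multiply-accumulate dot product.
reference = sum(x * y for x, y in zip(a, b))
packed = binary_dot(pack_bits(a), pack_bits(b), len(a))

# Storage ratio, assuming 32-bit floats for the full-precision baseline.
memory_saving = 32 // 1
```

On real hardware, the packed words would be machine registers and the popcount a single instruction, which is where the theoretical speedup comes from.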

¹ 13×13 filter.